Last updated: 2021-03-09

Checks: 7 0

Knit directory: Assignment/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20210309) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 42eec2d. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Untracked files:
    Untracked:  Assignment/
    Untracked:  analysis/finalresults.Rmd
    Untracked:  analysis/myproject/
    Untracked:  myproject/

Unstaged changes:
    Modified:   finalresults.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/first-analysis.Rmd) and HTML (docs/first-analysis.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd 42eec2d Defnecob 2021-03-09 wflow_publish(“~/Desktop/R Kurs/Assignment/analysis/first-analysis.Rmd”)

Based on the exercises from Rajbir: here

  1. Use the hfi dataset from the openintro package

  2. Fit a new model that uses pf_expression_control to predict hf_score, or the total human freedom score.

  3. Again add region to the model from Q4. Compare the slope and \(R^2\) with the model from Q4.

  4. Finally fit a model with ef_money and pf_expression_control as exposures and hf_score as outcome. Compare the slope and \(R^2\) from the models from Q1. Could ef_money be a confounder?

  5. Use a linear regression model (and scatter plot) with ef_money as exposure and pf_expression_control as outcome, to study whether ef_money has an association with pf_expression_control as well. This might validate our finding that ef_money is a confounder between pf_expression_control as exposure and hf_score as outcome from Q6.

Load packages

install.packages("openintro")

The downloaded binary packages are in
    /var/folders/3d/v7t2sdj52731h3r72t57tsf00000gn/T//RtmpG1sY9f/downloaded_packages
library(openintro)
library(broom)
library(tidyverse)

The data

Exercise 7:

hfi_2016 <- hfi %>% 
  filter(year == 2016)

head(hfi_2016)
# A tibble: 6 x 123
   year ISO_code countries region  pf_rol_procedur… pf_rol_civil pf_rol_criminal
  <dbl> <chr>    <chr>     <chr>              <dbl>        <dbl>           <dbl>
1  2016 ALB      Albania   Easter…             6.66         4.55            4.67
2  2016 DZA      Algeria   Middle…            NA           NA              NA   
3  2016 AGO      Angola    Sub-Sa…            NA           NA              NA   
4  2016 ARG      Argentina Latin …             7.10         5.79            4.34
5  2016 ARM      Armenia   Caucas…            NA           NA              NA   
6  2016 AUS      Australia Oceania             8.44         7.53            7.36
# … with 116 more variables: pf_rol <dbl>, pf_ss_homicide <dbl>,
#   pf_ss_disappearances_disap <dbl>, pf_ss_disappearances_violent <dbl>,
#   pf_ss_disappearances_organized <dbl>,
#   pf_ss_disappearances_fatalities <dbl>, pf_ss_disappearances_injuries <dbl>,
#   pf_ss_disappearances <dbl>, pf_ss_women_fgm <dbl>,
#   pf_ss_women_missing <dbl>, pf_ss_women_inheritance_widows <dbl>,
#   pf_ss_women_inheritance_daughters <dbl>, pf_ss_women_inheritance <dbl>,
#   pf_ss_women <dbl>, pf_ss <dbl>, pf_movement_domestic <dbl>,
#   pf_movement_foreign <dbl>, pf_movement_women <dbl>, pf_movement <dbl>,
#   pf_religion_estop_establish <dbl>, pf_religion_estop_operate <dbl>,
#   pf_religion_estop <dbl>, pf_religion_harassment <dbl>,
#   pf_religion_restrictions <dbl>, pf_religion <dbl>,
#   pf_association_association <dbl>, pf_association_assembly <dbl>,
#   pf_association_political_establish <dbl>,
#   pf_association_political_operate <dbl>, pf_association_political <dbl>,
#   pf_association_prof_establish <dbl>, pf_association_prof_operate <dbl>,
#   pf_association_prof <dbl>, pf_association_sport_establish <dbl>,
#   pf_association_sport_operate <dbl>, pf_association_sport <dbl>,
#   pf_association <dbl>, pf_expression_killed <dbl>,
#   pf_expression_jailed <dbl>, pf_expression_influence <dbl>,
#   pf_expression_control <dbl>, pf_expression_cable <dbl>,
#   pf_expression_newspapers <dbl>, pf_expression_internet <dbl>,
#   pf_expression <dbl>, pf_identity_legal <dbl>,
#   pf_identity_parental_marriage <dbl>, pf_identity_parental_divorce <dbl>,
#   pf_identity_parental <dbl>, pf_identity_sex_male <dbl>,
#   pf_identity_sex_female <dbl>, pf_identity_sex <dbl>,
#   pf_identity_divorce <dbl>, pf_identity <dbl>, pf_score <dbl>,
#   pf_rank <dbl>, ef_government_consumption <dbl>,
#   ef_government_transfers <dbl>, ef_government_enterprises <dbl>,
#   ef_government_tax_income <dbl>, ef_government_tax_payroll <dbl>,
#   ef_government_tax <dbl>, ef_government <dbl>, ef_legal_judicial <dbl>,
#   ef_legal_courts <dbl>, ef_legal_protection <dbl>, ef_legal_military <dbl>,
#   ef_legal_integrity <dbl>, ef_legal_enforcement <dbl>,
#   ef_legal_restrictions <dbl>, ef_legal_police <dbl>, ef_legal_crime <dbl>,
#   ef_legal_gender <dbl>, ef_legal <dbl>, ef_money_growth <dbl>,
#   ef_money_sd <dbl>, ef_money_inflation <dbl>, ef_money_currency <dbl>,
#   ef_money <dbl>, ef_trade_tariffs_revenue <dbl>,
#   ef_trade_tariffs_mean <dbl>, ef_trade_tariffs_sd <dbl>,
#   ef_trade_tariffs <dbl>, ef_trade_regulatory_nontariff <dbl>,
#   ef_trade_regulatory_compliance <dbl>, ef_trade_regulatory <dbl>,
#   ef_trade_black <dbl>, ef_trade_movement_foreign <dbl>,
#   ef_trade_movement_capital <dbl>, ef_trade_movement_visit <dbl>,
#   ef_trade_movement <dbl>, ef_trade <dbl>,
#   ef_regulation_credit_ownership <dbl>, ef_regulation_credit_private <dbl>,
#   ef_regulation_credit_interest <dbl>, ef_regulation_credit <dbl>,
#   ef_regulation_labor_minwage <dbl>, ef_regulation_labor_firing <dbl>,
#   ef_regulation_labor_bargain <dbl>, ef_regulation_labor_hours <dbl>, …

Exercise 8:

model1 <- lm(hf_score ~ pf_expression_control, data = hfi_2016)

tidy(model1) 
# A tibble: 2 x 5
  term                  estimate std.error statistic  p.value
  <chr>                    <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)              5.05     0.123       41.1 5.97e-87
2 pf_expression_control    0.368    0.0224      16.5 2.73e-36
summarise(hfi_2016, correlation_hf_pf = cor(pf_expression_control, hf_score))
# A tibble: 1 x 1
  correlation_hf_pf
              <dbl>
1             0.793
#Tells us that the correlation between expression_control and human freedom is 0.7931748
#And that the intercept estimate is 5.053396 and the slope is 0.368432, this suggests that for every increase in pf_expression_control per unit, the human freedom increases by 0.368432
slope1 <- model1$coefficients["pf_expression_control"]
slope1
pf_expression_control 
             0.368432 

Exercise 9:

model2<- lm(hf_score ~ pf_expression_control + region, data = hfi_2016)

tidy(model2) 
# A tibble: 11 x 5
   term                                estimate std.error statistic  p.value
   <chr>                                  <dbl>     <dbl>     <dbl>    <dbl>
 1 (Intercept)                           6.05      0.235     25.7   5.19e-57
 2 pf_expression_control                 0.275     0.0258    10.6   4.80e-20
 3 regionEast Asia                       0.241     0.328      0.732 4.65e- 1
 4 regionEastern Europe                 -0.102     0.267     -0.381 7.04e- 1
 5 regionLatin America & the Caribbean  -0.567     0.260     -2.18  3.08e- 2
 6 regionMiddle East & North Africa     -1.06      0.259     -4.10  6.80e- 5
 7 regionNorth America                   0.330     0.469      0.704 4.82e- 1
 8 regionOceania                        -0.131     0.375     -0.350 7.27e- 1
 9 regionSouth Asia                     -0.627     0.264     -2.37  1.88e- 2
10 regionSub-Saharan Africa             -0.972     0.245     -3.97  1.12e- 4
11 regionWestern Europe                  0.0487    0.298      0.163 8.71e- 1

the slope between human freedom and expression control decreased to 0.27471091 in comparison to 0.368432 in the previous model without regions, so that the finluence of pf_expression control also decreased

slope2 <- model2$coefficients["pf_expression_control"]
slope2
pf_expression_control 
            0.2747109 
glance(model1) 
# A tibble: 1 x 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.629         0.627 0.660      271. 2.73e-36     1  -161.  329.  338.
# … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
rsquared1 <- summary(model1)$r.squared

rsquared1 
[1] 0.6291262
glance(model2) 
# A tibble: 1 x 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.754         0.738 0.552      46.4 4.52e-41    10  -128.  280.  317.
# … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
rsquared2 <- summary(model2)$r.squared

rsquared2
[1] 0.7544951

the r.squared with the model1 is 0.6291262 and with the model2 including region is 0.7544951 so the r.squared value is increasing in the second model suggesting more variability in the hf score

Exercsie 10:

model3 <- lm(hf_score ~ ef_money, data = hfi_2016)
model4<- lm(hf_score ~ pf_expression_control + ef_money, data = hfi_2016)
tidy(model3)
# A tibble: 2 x 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)    2.74     0.387       7.07 4.56e-11
2 ef_money       0.504    0.0463     10.9  5.50e-21
glance(model3)
# A tibble: 1 x 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.425         0.422 0.821      118. 5.50e-21     1  -197.  400.  409.
# … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
tidy(model4)
# A tibble: 3 x 5
  term                  estimate std.error statistic  p.value
  <chr>                    <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)              2.79     0.244       11.5 1.38e-22
2 pf_expression_control    0.296    0.0189      15.7 4.68e-34
3 ef_money                 0.318    0.0315      10.1 7.74e-19
glance(model4)
# A tibble: 1 x 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.774         0.771 0.516      272. 4.38e-52     2  -121.  251.  263.
# … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
slope3 <- model3$coefficients["ef_money"]
slope3
 ef_money 
0.5038764 
slope4 <- model4$coefficients["ef_money"]
slope4
 ef_money 
0.3177051 
rsquared3 <- summary(model3)$r.squared
rsquared3
[1] 0.4252422
rsquared4 <- summary(model4)$r.squared
rsquared4
[1] 0.7740638

From comparing slope3 and slope4 one can tell that the additional exposure to hf_score and ef_money leads to an decrease in the ef_money slope from 0.504 to 0.352

When looking at the rsquared value: it increase from 0.425 to 0.756 indicating that ef_money can be a confounder

-> for identifying cofounders, a change in percentage of from one model to the other is required to be > 10% which is the case here, thus suggesting ef_money being a confounder

In general, when combining the variables the variance doesn’t increase as much as expected

Exercise 11:

#for model5: pf_expression control as outcome, ef_money as exposures

model5 <- lm(pf_expression_control ~ ef_money + region, data = hfi_2016)


summary(model5)

Call:
lm(formula = pf_expression_control ~ ef_money + region, data = hfi_2016)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.4657 -0.9607  0.0183  1.0914  3.2730 

Coefficients:
                                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)                          -0.6623     1.1658  -0.568 0.570811    
ef_money                              0.3721     0.1069   3.480 0.000656 ***
regionEast Asia                       2.8339     0.9678   2.928 0.003938 ** 
regionEastern Europe                  3.2402     0.7708   4.204 4.48e-05 ***
regionLatin America & the Caribbean   2.9607     0.7609   3.891 0.000149 ***
regionMiddle East & North Africa      0.3912     0.7862   0.498 0.619549    
regionNorth America                   4.6724     1.3693   3.412 0.000827 ***
regionOceania                         4.8369     1.0843   4.461 1.59e-05 ***
regionSouth Asia                      1.5715     0.8029   1.957 0.052145 .  
regionSub-Saharan Africa              2.1917     0.7443   2.945 0.003745 ** 
regionWestern Europe                  5.4289     0.7907   6.866 1.61e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.673 on 151 degrees of freedom
Multiple R-squared:  0.514, Adjusted R-squared:  0.4818 
F-statistic: 15.97 on 10 and 151 DF,  p-value: < 2.2e-16
tidy(model5)
# A tibble: 11 x 5
   term                                estimate std.error statistic  p.value
   <chr>                                  <dbl>     <dbl>     <dbl>    <dbl>
 1 (Intercept)                           -0.662     1.17     -0.568 5.71e- 1
 2 ef_money                               0.372     0.107     3.48  6.56e- 4
 3 regionEast Asia                        2.83      0.968     2.93  3.94e- 3
 4 regionEastern Europe                   3.24      0.771     4.20  4.48e- 5
 5 regionLatin America & the Caribbean    2.96      0.761     3.89  1.49e- 4
 6 regionMiddle East & North Africa       0.391     0.786     0.498 6.20e- 1
 7 regionNorth America                    4.67      1.37      3.41  8.27e- 4
 8 regionOceania                          4.84      1.08      4.46  1.59e- 5
 9 regionSouth Asia                       1.57      0.803     1.96  5.21e- 2
10 regionSub-Saharan Africa               2.19      0.744     2.94  3.74e- 3
11 regionWestern Europe                   5.43      0.791     6.87  1.61e-10
ggplot(data = hfi_2016, aes(x = ef_money, y = pf_expression_control))+ 
  geom_point(colour="darkgreen") + 
  ggtitle(label="Linear regression model with ef_money and pf_expression_control")+
  geom_smooth(method = "lm")+
  theme(
  panel.border = element_blank(),
  panel.grid.major = element_blank(),
  panel.grid.minor = element_blank(),
  panel.background=element_blank(),
  axis.line = element_line(colour = "grey")
  )
`geom_smooth()` using formula 'y ~ x'

This scatterplot shows that ef_money is a confounder between pf_expression_control as exposure and hf_score as outcome.


sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_0.5.1       stringr_1.4.0       dplyr_1.0.5        
 [4] purrr_0.3.4         readr_1.4.0         tidyr_1.1.3        
 [7] tibble_3.1.0        ggplot2_3.3.3       tidyverse_1.3.0    
[10] broom_0.7.4         openintro_2.0.0     usdata_0.1.0       
[13] cherryblossom_0.1.0 airports_0.1.0     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6        lattice_0.20-41   lubridate_1.7.9.2 assertthat_0.2.1 
 [5] rprojroot_2.0.2   digest_0.6.27     utf8_1.1.4        R6_2.5.0         
 [9] cellranger_1.1.0  backports_1.2.1   reprex_1.0.0      evaluate_0.14    
[13] highr_0.8         httr_1.4.2        pillar_1.5.1      rlang_0.4.10     
[17] readxl_1.3.1      rstudioapi_0.13   whisker_0.4       Matrix_1.2-18    
[21] rmarkdown_2.6     labeling_0.4.2    splines_4.0.3     munsell_0.5.0    
[25] compiler_4.0.3    httpuv_1.5.5      modelr_0.1.8      xfun_0.20        
[29] pkgconfig_2.0.3   mgcv_1.8-33       htmltools_0.5.1.1 tidyselect_1.1.0 
[33] workflowr_1.6.2   fansi_0.4.2       withr_2.4.1       crayon_1.4.1     
[37] dbplyr_2.1.0      later_1.1.0.1     grid_4.0.3        nlme_3.1-149     
[41] jsonlite_1.7.2    gtable_0.3.0      lifecycle_1.0.0   DBI_1.1.1        
[45] git2r_0.28.0      magrittr_2.0.1    scales_1.1.1      cli_2.3.1        
[49] stringi_1.5.3     farver_2.1.0      fs_1.5.0          promises_1.1.1   
[53] xml2_1.3.2        ellipsis_0.3.1    generics_0.1.0    vctrs_0.3.6      
[57] tools_4.0.3       glue_1.4.2        hms_1.0.0         yaml_2.2.1       
[61] colorspace_2.0-0  rvest_0.3.6       knitr_1.31        haven_2.3.1